Sketching Aggregates over Probabilistic Streams
Abstract
The data stream model of computation has proven a valuable tool for developing algorithms that process large amounts of data in small space. This survey examines an extension of this model that deals with uncertain data, called the probabilistic stream model. As in the standard setting, we are presented with a stream of items, with no random access to the data. However, each item is represented by a probability distribution function, allowing us to model the uncertainty associated with each element. We examine the computation of several aggregates in the probabilistic stream setting, including the frequency moments of the stream, average, minimum, and quantiles. The key difficulty in these computations is that the stream represents an exponential number of possible worlds, and even simple quantities such as the length of the stream can differ between possible worlds. Obtaining accurate, reliable estimates is therefore far from trivial.
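To make the possible-worlds view concrete, here is a minimal Python sketch on a toy stream. It is an illustration, not the survey's algorithm. The stream contents and the function names are hypothetical. It assumes items are independent, and that `None` marks an item absent from a given world. The sketch computes the expected stream length twice: by enumerating every possible world, and in one pass using linearity of expectation.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy stream: each item is a distribution over values.
# Assumption: None means "this item is absent in this possible world".
stream = [
    {1: Fraction(1, 2), None: Fraction(1, 2)},
    {2: Fraction(1, 3), 3: Fraction(1, 3), None: Fraction(1, 3)},
    {1: Fraction(3, 4), None: Fraction(1, 4)},
]


def possible_worlds(stream):
    """Yield (deterministic_stream, probability) for every possible world.

    Assumes items are independent, so a world's probability is the
    product of the probabilities of the per-item outcomes.
    """
    for outcome in product(*(item.items() for item in stream)):
        prob = Fraction(1)
        world = []
        for value, p in outcome:
            prob *= p
            if value is not None:
                world.append(value)
        yield world, prob


def expected_length_bruteforce(stream):
    """Expected length by summing over all (exponentially many) worlds."""
    return sum(len(world) * prob for world, prob in possible_worlds(stream))


def expected_length_one_pass(stream):
    """Expected length in one pass: sum of per-item presence probabilities."""
    total = Fraction(0)
    for item in stream:
        total += 1 - item.get(None, Fraction(0))
    return total


if __name__ == "__main__":
    print(expected_length_bruteforce(stream))
    print(expected_length_one_pass(stream))
```

The one-pass version works because length is a sum of per-item indicators. Aggregates such as the average are ratios of random quantities and do not decompose in this way, which is one reason they call for more careful estimation.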
Similar resources
Estimating Aggregate Properties on Probabilistic Streams
The probabilistic-stream model was introduced by Jayram et al. [16]. It is a generalization of the data stream model that is suited to handling "probabilistic" data where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over a potentially very large number of classical "deterministic" streams...
Sketching Streams Through the Net: Distributed Approximate Query Tracking
Emerging large-scale monitoring applications require continuous tracking of complex data-analysis queries over collections of physically distributed streams. Effective solutions have to be simultaneously space/time efficient (at each remote monitor site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality approximate query answers. In...
Sketch-based Querying of Distributed Sliding-Window Data Streams
While traditional data-management systems focus on evaluating single, ad hoc queries over static data sets in a centralized setting, several emerging applications require (possibly continuous) answers to queries on dynamic data that is widely distributed and constantly updated. Furthermore, such query answers often need to discount data that is “stale”, and operate solely on a sliding window of...
POISketch: Semantic Place Labeling over User Activity Streams
Capturing place semantics is critical for enabling location-based applications. Techniques for assigning semantic labels (e.g., “bar” or “office”) to unlabeled places mainly resort to mining user activity logs by exploiting visiting patterns. However, existing approaches focus on inferring place labels with a static user activity dataset, and ignore the visiting pattern dynamics in user activit...
A Simple and Efficient Estimation Method for Stream Expression Cardinalities
Estimating the cardinality (i.e., the number of distinct elements) of an arbitrary set expression defined over multiple distributed streams is one of the most fundamental queries of interest. Earlier methods based on probabilistic sketches have focused mostly on the sketching algorithms. However, the estimators do not fully utilize the information in the sketches and thus are not statistically effic...
Publication date: 2008